{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "view-in-github"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/documentation/transforms/python/elementwise/filter-py.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "view-the-docs-top"
   },
   "source": [
    "<table align=\"left\"><td><a target=\"_blank\" href=\"https://beam.apache.org/documentation/transforms/python/elementwise/filter\"><img src=\"https://beam.apache.org/images/logos/full-color/name-bottom/beam-logo-full-color-name-bottom-100.png\" width=\"32\" height=\"32\" />View the docs</a></td></table>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "id": "_-code"
   },
   "outputs": [],
   "source": [
    "#@title Licensed under the Apache License, Version 2.0 (the \"License\")\n",
    "# Licensed to the Apache Software Foundation (ASF) under one\n",
    "# or more contributor license agreements. See the NOTICE file\n",
    "# distributed with this work for additional information\n",
    "# regarding copyright ownership. The ASF licenses this file\n",
    "# to you under the Apache License, Version 2.0 (the\n",
    "# \"License\"); you may not use this file except in compliance\n",
    "# with the License. You may obtain a copy of the License at\n",
    "#\n",
    "#   http://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing,\n",
    "# software distributed under the License is distributed on an\n",
    "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
    "# KIND, either express or implied. See the License for the\n",
    "# specific language governing permissions and limitations\n",
    "# under the License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "filter"
   },
   "source": [
    "# Filter\n",
    "\n",
    "<script type=\"text/javascript\">\n",
    "localStorage.setItem('language', 'language-py')\n",
    "</script>\n",
    "\n",
    "<table align=\"left\" style=\"margin-right:1em\">\n",
    "  <td>\n",
    "    <a class=\"button\" target=\"_blank\" href=\"https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.Filter\"><img src=\"https://beam.apache.org/images/logos/sdks/python.png\" width=\"32px\" height=\"32px\" alt=\"Pydoc\"/> Pydoc</a>\n",
    "  </td>\n",
    "</table>\n",
    "\n",
    "<br/><br/><br/>\n",
    "\n",
    "Given a predicate, filter out all elements that don't satisfy that predicate.\n",
    "May also be used to filter based on an inequality with a given value based\n",
    "on the comparison ordering of the element."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "setup"
   },
   "source": [
    "## Setup\n",
    "\n",
    "To run a code cell, you can click the **Run cell** button at the top left of the cell,\n",
    "or select it and press **`Shift+Enter`**.\n",
    "Try modifying a code cell and re-running it to see what happens.\n",
    "\n",
    "> To learn more about Colab, see\n",
    "> [Welcome to Colaboratory!](https://colab.sandbox.google.com/notebooks/welcome.ipynb).\n",
    "\n",
    "First, let's install the `apache-beam` module."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "setup-code"
   },
   "outputs": [],
   "source": [
    "!pip install --quiet -U apache-beam"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "examples"
   },
   "source": [
    "## Examples\n",
    "\n",
    "In the following examples, we create a pipeline with a `PCollection` of produce with their icon, name, and duration.\n",
    "Then, we apply `Filter` in multiple ways to filter out produce by their duration value.\n",
    "\n",
    "`Filter` accepts a function that keeps elements that return `True`, and filters out the remaining elements."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "example-1-filtering-with-a-function"
   },
   "source": [
    "### Example 1: Filtering with a function\n",
    "\n",
    "We define a function `is_perennial` which returns `True` if the element's duration equals `'perennial'`, and `False` otherwise."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "example-1-filtering-with-a-function-code"
   },
   "outputs": [],
   "source": [
    "import apache_beam as beam\n",
    "\n",
    "def is_perennial(plant):\n",
    "  return plant['duration'] == 'perennial'\n",
    "\n",
    "with beam.Pipeline() as pipeline:\n",
    "  perennials = (\n",
    "      pipeline\n",
    "      | 'Gardening plants' >> beam.Create([\n",
    "          {\n",
    "              'icon': '🍓', 'name': 'Strawberry', 'duration': 'perennial'\n",
    "          },\n",
    "          {\n",
    "              'icon': '🥕', 'name': 'Carrot', 'duration': 'biennial'\n",
    "          },\n",
    "          {\n",
    "              'icon': '🍆', 'name': 'Eggplant', 'duration': 'perennial'\n",
    "          },\n",
    "          {\n",
    "              'icon': '🍅', 'name': 'Tomato', 'duration': 'annual'\n",
    "          },\n",
    "          {\n",
    "              'icon': '🥔', 'name': 'Potato', 'duration': 'perennial'\n",
    "          },\n",
    "      ])\n",
    "      | 'Filter perennials' >> beam.Filter(is_perennial)\n",
    "      | beam.Map(print))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "example-1-filtering-with-a-function-2"
   },
   "source": [
    "<table align=\"left\" style=\"margin-right:1em\">\n",
    "  <td>\n",
    "    <a class=\"button\" target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/filter.py\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" width=\"32px\" height=\"32px\" alt=\"View source code\"/> View source code</a>\n",
    "  </td>\n",
    "</table>\n",
    "\n",
    "<br/><br/><br/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "example-2-filtering-with-a-lambda-function"
   },
   "source": [
    "### Example 2: Filtering with a lambda function\n",
    "\n",
    "We can also use lambda functions to simplify **Example 1**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "example-2-filtering-with-a-lambda-function-code"
   },
   "outputs": [],
   "source": [
    "import apache_beam as beam\n",
    "\n",
    "with beam.Pipeline() as pipeline:\n",
    "  perennials = (\n",
    "      pipeline\n",
    "      | 'Gardening plants' >> beam.Create([\n",
    "          {\n",
    "              'icon': '🍓', 'name': 'Strawberry', 'duration': 'perennial'\n",
    "          },\n",
    "          {\n",
    "              'icon': '🥕', 'name': 'Carrot', 'duration': 'biennial'\n",
    "          },\n",
    "          {\n",
    "              'icon': '🍆', 'name': 'Eggplant', 'duration': 'perennial'\n",
    "          },\n",
    "          {\n",
    "              'icon': '🍅', 'name': 'Tomato', 'duration': 'annual'\n",
    "          },\n",
    "          {\n",
    "              'icon': '🥔', 'name': 'Potato', 'duration': 'perennial'\n",
    "          },\n",
    "      ])\n",
    "      | 'Filter perennials' >>\n",
    "      beam.Filter(lambda plant: plant['duration'] == 'perennial')\n",
    "      | beam.Map(print))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "example-2-filtering-with-a-lambda-function-2"
   },
   "source": [
    "<table align=\"left\" style=\"margin-right:1em\">\n",
    "  <td>\n",
    "    <a class=\"button\" target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/filter.py\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" width=\"32px\" height=\"32px\" alt=\"View source code\"/> View source code</a>\n",
    "  </td>\n",
    "</table>\n",
    "\n",
    "<br/><br/><br/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "example-3-filtering-with-multiple-arguments"
   },
   "source": [
    "### Example 3: Filtering with multiple arguments\n",
    "\n",
    "You can pass functions with multiple arguments to `Filter`.\n",
    "They are passed as additional positional arguments or keyword arguments to the function.\n",
    "\n",
    "In this example, `has_duration` takes `plant` and `duration` as arguments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "example-3-filtering-with-multiple-arguments-code"
   },
   "outputs": [],
   "source": [
    "import apache_beam as beam\n",
    "\n",
    "def has_duration(plant, duration):\n",
    "  return plant['duration'] == duration\n",
    "\n",
    "with beam.Pipeline() as pipeline:\n",
    "  perennials = (\n",
    "      pipeline\n",
    "      | 'Gardening plants' >> beam.Create([\n",
    "          {\n",
    "              'icon': '🍓', 'name': 'Strawberry', 'duration': 'perennial'\n",
    "          },\n",
    "          {\n",
    "              'icon': '🥕', 'name': 'Carrot', 'duration': 'biennial'\n",
    "          },\n",
    "          {\n",
    "              'icon': '🍆', 'name': 'Eggplant', 'duration': 'perennial'\n",
    "          },\n",
    "          {\n",
    "              'icon': '🍅', 'name': 'Tomato', 'duration': 'annual'\n",
    "          },\n",
    "          {\n",
    "              'icon': '🥔', 'name': 'Potato', 'duration': 'perennial'\n",
    "          },\n",
    "      ])\n",
    "      | 'Filter perennials' >> beam.Filter(has_duration, 'perennial')\n",
    "      | beam.Map(print))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "example-3-filtering-with-multiple-arguments-2"
   },
   "source": [
    "<table align=\"left\" style=\"margin-right:1em\">\n",
    "  <td>\n",
    "    <a class=\"button\" target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/filter.py\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" width=\"32px\" height=\"32px\" alt=\"View source code\"/> View source code</a>\n",
    "  </td>\n",
    "</table>\n",
    "\n",
    "<br/><br/><br/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "example-4-filtering-with-side-inputs-as-singletons"
   },
   "source": [
    "### Example 4: Filtering with side inputs as singletons\n",
    "\n",
    "If the `PCollection` has a single value, such as the average from another computation,\n",
    "passing the `PCollection` as a *singleton* accesses that value.\n",
    "\n",
    "In this example, we pass a `PCollection` the value `'perennial'` as a singleton.\n",
    "We then use that value to filter out perennials."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "example-4-filtering-with-side-inputs-as-singletons-code"
   },
   "outputs": [],
   "source": [
    "import apache_beam as beam\n",
    "\n",
    "with beam.Pipeline() as pipeline:\n",
    "  perennial = pipeline | 'Perennial' >> beam.Create(['perennial'])\n",
    "\n",
    "  perennials = (\n",
    "      pipeline\n",
    "      | 'Gardening plants' >> beam.Create([\n",
    "          {\n",
    "              'icon': '🍓', 'name': 'Strawberry', 'duration': 'perennial'\n",
    "          },\n",
    "          {\n",
    "              'icon': '🥕', 'name': 'Carrot', 'duration': 'biennial'\n",
    "          },\n",
    "          {\n",
    "              'icon': '🍆', 'name': 'Eggplant', 'duration': 'perennial'\n",
    "          },\n",
    "          {\n",
    "              'icon': '🍅', 'name': 'Tomato', 'duration': 'annual'\n",
    "          },\n",
    "          {\n",
    "              'icon': '🥔', 'name': 'Potato', 'duration': 'perennial'\n",
    "          },\n",
    "      ])\n",
    "      | 'Filter perennials' >> beam.Filter(\n",
    "          lambda plant,\n",
    "          duration: plant['duration'] == duration,\n",
    "          duration=beam.pvalue.AsSingleton(perennial),\n",
    "      )\n",
    "      | beam.Map(print))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "example-4-filtering-with-side-inputs-as-singletons-2"
   },
   "source": [
    "<table align=\"left\" style=\"margin-right:1em\">\n",
    "  <td>\n",
    "    <a class=\"button\" target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/filter.py\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" width=\"32px\" height=\"32px\" alt=\"View source code\"/> View source code</a>\n",
    "  </td>\n",
    "</table>\n",
    "\n",
    "<br/><br/><br/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "example-5-filtering-with-side-inputs-as-iterators"
   },
   "source": [
    "### Example 5: Filtering with side inputs as iterators\n",
    "\n",
    "If the `PCollection` has multiple values, pass the `PCollection` as an *iterator*.\n",
    "This accesses elements lazily as they are needed,\n",
    "so it is possible to iterate over large `PCollection`s that won't fit into memory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "example-5-filtering-with-side-inputs-as-iterators-code"
   },
   "outputs": [],
   "source": [
    "import apache_beam as beam\n",
    "\n",
    "with beam.Pipeline() as pipeline:\n",
    "  valid_durations = pipeline | 'Valid durations' >> beam.Create([\n",
    "      'annual',\n",
    "      'biennial',\n",
    "      'perennial',\n",
    "  ])\n",
    "\n",
    "  valid_plants = (\n",
    "      pipeline\n",
    "      | 'Gardening plants' >> beam.Create([\n",
    "          {\n",
    "              'icon': '🍓', 'name': 'Strawberry', 'duration': 'perennial'\n",
    "          },\n",
    "          {\n",
    "              'icon': '🥕', 'name': 'Carrot', 'duration': 'biennial'\n",
    "          },\n",
    "          {\n",
    "              'icon': '🍆', 'name': 'Eggplant', 'duration': 'perennial'\n",
    "          },\n",
    "          {\n",
    "              'icon': '🍅', 'name': 'Tomato', 'duration': 'annual'\n",
    "          },\n",
    "          {\n",
    "              'icon': '🥔', 'name': 'Potato', 'duration': 'PERENNIAL'\n",
    "          },\n",
    "      ])\n",
    "      | 'Filter valid plants' >> beam.Filter(\n",
    "          lambda plant,\n",
    "          valid_durations: plant['duration'] in valid_durations,\n",
    "          valid_durations=beam.pvalue.AsIter(valid_durations),\n",
    "      )\n",
    "      | beam.Map(print))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "example-5-filtering-with-side-inputs-as-iterators-2"
   },
   "source": [
    "<table align=\"left\" style=\"margin-right:1em\">\n",
    "  <td>\n",
    "    <a class=\"button\" target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/filter.py\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" width=\"32px\" height=\"32px\" alt=\"View source code\"/> View source code</a>\n",
    "  </td>\n",
    "</table>\n",
    "\n",
    "<br/><br/><br/>\n",
    "\n",
    "> **Note**: You can pass the `PCollection` as a *list* with `beam.pvalue.AsList(pcollection)`,\n",
    "> but this requires that all the elements fit into memory."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "example-6-filtering-with-side-inputs-as-dictionaries"
   },
   "source": [
    "### Example 6: Filtering with side inputs as dictionaries\n",
    "\n",
    "If a `PCollection` is small enough to fit into memory, then that `PCollection` can be passed as a *dictionary*.\n",
    "Each element must be a `(key, value)` pair.\n",
    "Note that all the elements of the `PCollection` must fit into memory for this.\n",
    "If the `PCollection` won't fit into memory, use `beam.pvalue.AsIter(pcollection)` instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "example-6-filtering-with-side-inputs-as-dictionaries-code"
   },
   "outputs": [],
   "source": [
    "import apache_beam as beam\n",
    "\n",
    "with beam.Pipeline() as pipeline:\n",
    "  keep_duration = pipeline | 'Duration filters' >> beam.Create([\n",
    "      ('annual', False),\n",
    "      ('biennial', False),\n",
    "      ('perennial', True),\n",
    "  ])\n",
    "\n",
    "  perennials = (\n",
    "      pipeline\n",
    "      | 'Gardening plants' >> beam.Create([\n",
    "          {\n",
    "              'icon': '🍓', 'name': 'Strawberry', 'duration': 'perennial'\n",
    "          },\n",
    "          {\n",
    "              'icon': '🥕', 'name': 'Carrot', 'duration': 'biennial'\n",
    "          },\n",
    "          {\n",
    "              'icon': '🍆', 'name': 'Eggplant', 'duration': 'perennial'\n",
    "          },\n",
    "          {\n",
    "              'icon': '🍅', 'name': 'Tomato', 'duration': 'annual'\n",
    "          },\n",
    "          {\n",
    "              'icon': '🥔', 'name': 'Potato', 'duration': 'perennial'\n",
    "          },\n",
    "      ])\n",
    "      | 'Filter plants by duration' >> beam.Filter(\n",
    "          lambda plant,\n",
    "          keep_duration: keep_duration[plant['duration']],\n",
    "          keep_duration=beam.pvalue.AsDict(keep_duration),\n",
    "      )\n",
    "      | beam.Map(print))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "example-6-filtering-with-side-inputs-as-dictionaries-2"
   },
   "source": [
    "<table align=\"left\" style=\"margin-right:1em\">\n",
    "  <td>\n",
    "    <a class=\"button\" target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/filter.py\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" width=\"32px\" height=\"32px\" alt=\"View source code\"/> View source code</a>\n",
    "  </td>\n",
    "</table>\n",
    "\n",
    "<br/><br/><br/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "related-transforms"
   },
   "source": [
    "## Related transforms\n",
    "\n",
    "* [FlatMap](https://beam.apache.org/documentation/transforms/python/elementwise/flatmap) behaves the same as `Map`, but for\n",
    "  each input it might produce zero or more outputs.\n",
    "* [ParDo](https://beam.apache.org/documentation/transforms/python/elementwise/pardo) is the most general elementwise mapping\n",
    "  operation, and includes other abilities such as multiple output collections and side-inputs.\n",
    "\n",
    "<table align=\"left\" style=\"margin-right:1em\">\n",
    "  <td>\n",
    "    <a class=\"button\" target=\"_blank\" href=\"https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.Filter\"><img src=\"https://beam.apache.org/images/logos/sdks/python.png\" width=\"32px\" height=\"32px\" alt=\"Pydoc\"/> Pydoc</a>\n",
    "  </td>\n",
    "</table>\n",
    "\n",
    "<br/><br/><br/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "view-the-docs-bottom"
   },
   "source": [
    "<table align=\"left\"><td><a target=\"_blank\" href=\"https://beam.apache.org/documentation/transforms/python/elementwise/filter\"><img src=\"https://beam.apache.org/images/logos/full-color/name-bottom/beam-logo-full-color-name-bottom-100.png\" width=\"32\" height=\"32\" />View the docs</a></td></table>"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "name": "Filter - element-wise transform",
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "python3",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
